XXL @ INEX 2003

Authors

  • Ralf Schenkel
  • Anja Theobald
  • Gerhard Weikum
Abstract

Information retrieval on XML combines retrieval on content data (element and attribute values) with retrieval on structural data (element and attribute names). Standard query languages for XML such as XPath or XQuery support Boolean retrieval: a query result is a (possibly restructured) subset of XML elements or entire documents that satisfy the search conditions of the query. Such search conditions consist of regular path expressions, including wildcards for paths of arbitrary length, and Boolean content conditions. We developed a flexible XML search language called XXL for probabilistic ranked retrieval on XML data. XXL offers a special operator '~' for specifying semantic similarity search conditions on element names as well as element values. Ontological knowledge and appropriate index structures are necessary for semantic similarity search on XML data extracted from the Web, intranets or other document collections. The XXL Search Engine is a Java-based prototype implementation that supports probabilistic ranked retrieval on a large corpus of XML data. This paper outlines the architecture of the XXL system and discusses its performance in the INEX benchmark.
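The contrast between Boolean retrieval and ranked retrieval with a similarity operator can be illustrated with a small Python sketch. It is not the XXL engine or its query syntax: the `ONTOLOGY` table, its weights, the `~name` step notation, and the choice to multiply per-step scores are all assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET

# Hypothetical similarity weights standing in for ontological knowledge.
# The term pairs and values are invented for this example.
ONTOLOGY = {
    ("article", "paper"): 0.9,
    ("article", "report"): 0.6,
    ("title", "heading"): 0.8,
}


def name_similarity(query_name, element_name):
    """Similarity in [0, 1] between a query name and an element name."""
    if query_name == element_name:
        return 1.0
    return max(
        ONTOLOGY.get((query_name, element_name), 0.0),
        ONTOLOGY.get((element_name, query_name), 0.0),
    )


def step_score(step, element_name):
    """Score one path step; a leading '~' requests similarity matching."""
    if step.startswith("~"):
        return name_similarity(step[1:], element_name)
    # Without '~' the step is Boolean: exact name match or nothing.
    return 1.0 if step == element_name else 0.0


def ranked_matches(root, steps):
    """Return (score, text) pairs for paths below root, best score first."""
    results = []

    def walk(element, remaining, score):
        # Assumption: step scores are combined by multiplication.
        combined = score * step_score(remaining[0], element.tag)
        if combined == 0.0:
            return
        if len(remaining) == 1:
            results.append((combined, (element.text or "").strip()))
            return
        for child in element:
            walk(child, remaining[1:], combined)

    for child in root:
        walk(child, steps, 1.0)
    results.sort(key=lambda pair: pair[0], reverse=True)
    return results


DOC = ET.fromstring(
    "<library>"
    "<article><title>XML retrieval</title></article>"
    "<paper><heading>Ranked search</heading></paper>"
    "<book><title>Unrelated</title></book>"
    "</library>"
)

# Boolean-style query: only the exact article/title path qualifies.
exact = ranked_matches(DOC, ["article", "title"])
# Similarity query: paper/heading also qualifies, with a lower score.
similar = ranked_matches(DOC, ["~article", "~title"])
```

In this sketch the exact query returns only the `article/title` element, while the similarity query also returns `paper/heading`, ranked below the exact match. The Boolean case is simply the special case where every step scores 0 or 1.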

Similar resources

Query Refinement by Relevance Feedback in an XML Retrieval System

In recent years, ranked retrieval systems for heterogeneous XML data with both structural search conditions and keyword conditions have been developed for digital libraries, federations of scientific data repositories, and hopefully portions of the ultimate Web. These systems, such as XXL [2], are based on pre-defined similarity measures for atomic conditions (using index structures on contents...

A Status Report on XXL - a Software Infrastructure for Efficient Query Processing

XXL is a Java library that contains a rich infrastructure for implementing advanced query processing functionality. The library offers low-level components like access to raw disks as well as high-level ones like a query optimizer. On the intermediate levels, XXL provides a demand-driven cursor algebra, a framework for indexing and a powerful package for supporting aggregation. The library is p...

Report on the INEX 2003 Workshop, Schloss Dagstuhl, 15-17 December 2003

The widespread use of the eXtensible Markup Language (XML), especially the increasing use of XML in scientific data repositories, digital libraries and on the web, brought about an explosion in the development of XML retrieval systems to store and access XML content [BGS03]. These retrieval systems exploit the logical structure of the documents, which is explicitly represented by the XML markup...

Cheshire II at INEX ’03: Component and Algorithm Fusion for XML Retrieval

This paper describes the retrieval approach that UC Berkeley used in the 2003 INEX evaluation. As in last year’s INEX, our primary approach is the combination of probabilistic methods using a logistic regression algorithm for estimation of document (article) relevance and/or element relevance, along with Boolean constraints. This year we also used data fusion techniques to combine results fro...

The University of Amsterdam at INEX 2003

This paper describes the INEX 2003 participation of the Language & Inference Technology group of the University of Amsterdam. We participated in all three of the tasks: content-only, strict content-and-structure, and vague content-and-structure.



Publication date: 2004